Minimizing the Communication Time for Matrix Multiplication on Multiprocessors

Author

S. Lennart Johnsson

Abstract

We present one matrix multiplication algorithm for two-dimensional arrays of processing nodes and one algorithm for three-dimensional nodal arrays; one-dimensional nodal arrays are treated as a degenerate case. The algorithms are designed to fully utilize the communication bandwidth of high-degree networks in which the one-, two-, or three-dimensional arrays may be embedded. For binary n-cubes, our algorithms offer a communication speedup of a factor of $n/2$ over previous algorithms for square matrices and square two-dimensional arrays. Configuring the $N = 2^n$ processing nodes as a three-dimensional array may reduce the communication complexity by a factor of $N^{1/6}$ compared to a two-dimensional nodal array. The three-dimensional algorithm requires temporary storage proportional to the length of the nodal array axis aligned with the axis shared between the multiplier and the multiplicand. With the product matrix accumulated in place, the communication-optimal two-dimensional nodal array has a ratio of node rows to node columns equal to the ratio of rows to columns of the product matrix. The optimal three-dimensional nodal array has machine-axis lengths in approximately the same ratio as the lengths of the three axes of the matrix multiplication. For product matrices of extreme shape, one-dimensional nodal array shapes are optimal when $N/n$ is less than the number of columns of the product matrix. All our algorithms use standard communication functions.
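
The shape rules in the abstract can be illustrated with a minimal Python sketch under a deliberately simplified model, which is an assumption and not the paper's cost model: for $C = AB$ with $A$ of size $P \times Q$ and $B$ of size $Q \times R$, count the elements of $A$, of $B$, and (only when the shared $Q$ axis is split) of partial sums of $C$ that each node handles, ignoring n-cube bandwidth, start-up costs, and routing. The function names `per_node_volume` and `best_shape` are hypothetical. In two dimensions, minimizing $PQ/N_r + QR/N_c$ subject to $N_r N_c = N$ gives $N_r/N_c = P/R$; the exhaustive search over power-of-two shapes reproduces that rule, the approximate $P:Q:R$ rule in three dimensions, and a square-matrix 2-D to 3-D volume ratio that grows as $N^{1/6}$.

```python
import math


def per_node_volume(P, Q, R, n1, n2, n3):
    """Hypothetical per-node data-volume model for C = A @ B.

    A is P x Q, B is Q x R, C is P x R. The nodal array is n1 x n2 x n3,
    with its axes aligned to the P, Q and R axes of the computation.
    n2 == 1 is the two-dimensional case with C accumulated in place;
    n1 or n3 equal to the full node count gives a one-dimensional array.
    """
    a_block = P * Q / (n1 * n2)  # elements of A each node must hold
    b_block = Q * R / (n2 * n3)  # elements of B each node must hold
    # Partial sums of C need a reduction only when the shared Q axis is split.
    c_block = P * R / (n1 * n3) if n2 > 1 else 0.0
    return a_block + b_block + c_block


def best_shape(P, Q, R, n, dims):
    """Exhaustively search power-of-two shapes (2^a, 2^b, 2^c), a + b + c = n.

    dims=2 keeps the Q axis unsplit (b = 0); dims=3 allows every split.
    Returns ((n1, n2, n3), volume) for the smallest modeled volume.
    """
    best = None
    for a in range(n + 1):
        for b in range(n + 1 - a):
            if dims == 2 and b != 0:
                continue
            c = n - a - b
            shape = (2 ** a, 2 ** b, 2 ** c)
            volume = per_node_volume(P, Q, R, *shape)
            if best is None or volume < best[1]:
                best = (shape, volume)
    return best


if __name__ == "__main__":
    # 2-D, product matrix 4096 x 1024: node rows / node columns = 64 / 16 = 4.
    print(best_shape(4096, 1024, 1024, 10, dims=2))  # ((64, 1, 16), 131072.0)
    # 3-D, axes in ratio 4:1:1: machine axes 32:8:8 keep the same ratio.
    print(best_shape(8192, 2048, 2048, 11, dims=3))  # ((32, 8, 8), 196608.0)
    # Extreme shape (2^20 x 1 product): a one-dimensional array wins.
    print(best_shape(2 ** 20, 64, 1, 10, dims=2))  # ((1024, 1, 1), 65600.0)
    # Square case, N = 4096: 2-D / 3-D volume ratio equals (2/3) * N**(1/6).
    v2 = best_shape(4096, 4096, 4096, 12, dims=2)[1]
    v3 = best_shape(4096, 4096, 4096, 12, dims=3)[1]
    print(v2 / v3, math.isclose(v2 / v3, (2 / 3) * 4096 ** (1 / 6)))
```

Because the search is exhaustive over power-of-two splits, one-dimensional shapes appear for extreme matrix shapes without special-casing them. The model does not reproduce the $n/2$ speedup, which comes from exploiting the communication bandwidth of the n-cube and is outside this volume count.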

Similar resources

Parallel Matrix Multiplication Algorithms on Hypercube Multiprocessors

In this paper, we present three parallel algorithms for matrix multiplication. The first one, which employs pipelining techniques on a mesh grid, uses only one copy of data matrices. The second one uses multiple copies of data matrices also on a mesh grid. Although data communication operations of the second algorithm are reduced, the requirement of local data memory for each processing element i...

Modified 32-Bit Shift-Add Multiplier Design for Low Power Application

Multiplication is a basic operation in any signal processing application. Multiplication is the most important of the four arithmetic operations (addition, subtraction, multiplication, and division). Multipliers are usually hardware intensive, and the main parameters of concern are high speed, low cost, and small VLSI area. The propagation time and power consumption in the multiplier are always high. ...

A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication which could run on a Fibonacci Hypercube structure. Most of the popular algorithms for parallel matrix multiplication cannot run on Fibonacci Hypercube structure, therefore giving a method that can be run on all structures especially Fibonacci Hypercube structure is necessary for parallel matr...

Matrix-Matrix Multiplications and Fault Tolerance on Hypercube Multiprocessors

Several new algorithms for matrix-matrix multiplications on hypercube multiprocessors are presented and evaluated based on the number of multiplications, additions, and transfers. The matrices to be multiplied are uniformly distributed to all processors of a hypercube system. Each processor owns some submatrices which are derived by dividing the source matrices. Each submatrix multiplication ca...

Segmented Operations for Sparse Matrix Computation on Vector Multiprocessors

In this paper we present a new technique for sparse matrix multiplication on vector multiprocessors based on the efficient implementation of a segmented sum operation. We describe how the segmented sum can be implemented on vector multiprocessors such that it both fully vectorizes within each processor and parallelizes across processors. Because of our method’s insensitivity to relative row siz...

Journal title:

Parallel Computing

Volume: 19, Issue:

Pages: -

Publication date: 1993